Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

This is part of the Self Driving Car Nanodegree by Udacity.

The goal is to classify german traffic sign images with a deep learning model.

In this notebook I will give an overview of the data. Then images are normalized, so the minimum value of each image is 0 and the maximum value 255. Also the amount of images is increased by image augmentation techniques. These are changes in brightness (though they are normalized afterwards), rotation, translation, scaling and sheering.

The deep learning model itself has the same architecture as the LeNet-5, but the filter depth got increased as the images are in colour and therfore the input "depth" is bigger. As a result of the larger depth the fully connected layer has more neurons. I also added a dropout layer, as this net seems like overfitting the training examples. Without augmentation it reaches 100% classification on the test set quite early. Even with augmentation the net is overfitting.

This net reached a validation accuracy of 94.1% and a test accuracy of 95.5%. It is overfitting, but im running out of time to tweak the net even further.


Step 0: Load The Data

Download and extract the file:

In [1]:
import tensorflow as tf
import pandas as pd
In [2]:
import urllib.request
import zipfile
import os.path
from os import makedirs

url = "https://d17h27t6h515a5.cloudfront.net/topher/2017/February/5898cd6f_traffic-signs-data/traffic-signs-data.zip" 
local_file = "./data/traffic-signs-data.zip"

if not os.path.exists("./data"):
    os.makedirs("data")

if not os.path.exists(local_file):
    urllib.request.urlretrieve(url, local_file)
    print("Downloaded file")
    zip_ref = zipfile.ZipFile(local_file, 'r')
    zip_ref.extractall("./data/traffic-signs")
    zip_ref.close()
    print("Extracted file")
else:
    print("File already exists")
File already exists
In [3]:
# Load pickled data
import pickle

base_data_path = "./data/traffic-signs/"

training_file = base_data_path + "train.p"
validation_file= base_data_path + "test.p"
testing_file = base_data_path + "valid.p"

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']
In [4]:
class_names = pd.read_csv("./signnames.csv")

Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 1D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height) representing the original width and height the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2) representing coordinates of a bounding box around the sign in the image. THESE COORDINATES ASSUME THE ORIGINAL IMAGE. THE PICKLED DATA CONTAINS RESIZED VERSIONS (32 by 32) OF THESE IMAGES

Basic Summary of the Data Set

In [5]:
import numpy as np
In [6]:
### Replace each question mark with the appropriate value. 
### Use python, pandas or numpy methods rather than hard coding the results

# SOLUTION: Number of training examples
n_train = y_train.shape[0]

# SOLUTION: Number of validation examples
n_validation = y_valid.shape[0]

# SOLUTION: Number of testing examples.
n_test = y_test.shape[0]

# SOLUTION: What's the shape of an traffic sign image?
image_shape = X_train[0].shape

# SOLUTION: How many unique classes/labels there are in the dataset.
n_classes = len(np.unique(y_train))

print("Number of training examples =", n_train)
print("Number of validation examples =", n_validation)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 34799
Number of validation examples = 12630
Number of testing examples = 4410
Image data shape = (32, 32, 3)
Number of classes = 43

Exploratory visualization of the dataset

In [7]:
import matplotlib.pyplot as plt
%matplotlib inline

Lets take a look at the class distributions per set.

In [8]:
import pandas as pd
def class_histogram(labels, title_postfix = ""):
    pd.DataFrame(labels).hist(bins = n_classes, grid = False)
    plt.title("Distribution of Labels" + title_postfix)
    plt.xlabel("Class Number")
In [9]:
class_histogram(y_train, " in Train Data")
class_histogram(y_valid, " in Validation Data")
class_histogram(y_test, " in Test Data")

Class distribution is pretty much the same on each set, though there are more prominent classes than others. To improve results further, one could rebalance the class distribution with augmented images.

Now lets take a look at some images of each class as they are ordered in the dataset.

In [10]:
def plot_classes(images_per_class, labels, images, shuffle_in_class = False):
    classes = pd.DataFrame({"Class": (labels)}).groupby("Class")
    number_of_classes = len(np.unique(labels))
    for group in classes:
        figure_number = 1
        for column in range(images_per_class):
            class_number = group[0]
            if shuffle_in_class:
                elements = group[1].sample(images_per_class).index
            else:
                elements = group[1].head(images_per_class).index
            
            plt.subplot(1, images_per_class, figure_number)
            plt.imshow(images[elements[column]])
            plt.axis('off')
            if(figure_number == int((images_per_class + 1) / 2)):
                 plt.title("Class: " + str(class_names.loc[class_number,"SignName"]) + ", Images: " +  str(group[1].shape[0]))
            
            figure_number += 1
       
        plt.show()
In [11]:
plot_classes(5, y_train, X_train)

The class distribution seem to be according to the distribution of the signs you see on german streets. For example class one, a speed limit of 20 km/h is very rare to see, as usally the speed limit in residential areas is 30 km/h. So there are only 180 images of this sign. There are up to 2000 images for the "popular" signs.

As far as I know the images were taken from a car and multiple images of each sign were taken as the car moved by. Therfore the dateset contains the same image with different alterations in the angle or in brightness. Lets take a look at random pictures per class to see some different ones.

In [12]:
plot_classes(5, y_train, X_train, shuffle_in_class=True)

We will also take a look at the images in the validation and test set.

In [13]:
plot_classes(5, y_valid, X_valid, shuffle_in_class=True)
In [14]:
plot_classes(5, y_test, X_test, shuffle_in_class=True)

Step 2: Model Architecture

Pre-processing

For preprocessing the images are normalized, so every image uses the full brightness spectrum from value 0 to value 255. This improves images that are very dark, but also adds shifts in colour. One could use better techniques to normalize the brightness, as converting in the HSL colorspace and normalize brighntess there.

Normalizing the images improved the accuracy of the net.

For augmentation a random brightness is added and then the images get normalized. Brightness is added before normalizing, to emulate different conditions when the picture was taken. Also random translations, rotations, sheering and scaling was added. These augmentation techniques are used quite often with image recognition tasks and should improve overfitting.

At last the image values where converted to a -1.0 to 1.0 range for deep learning.

In [15]:
def to_float_image(images):
    """
    Converts images in the range 0 - 255 to the range 0.0 - 1.0.
    This can still be plotted with matplotlib.
    """
    return images.astype(np.float32) / 255.0

This type of normalization changes the colors of the images a lot when the images are dark, as every colorchannel gets multiplied by a constant factor. To improve on this, one could convert the image to the HSL colorspace, adjust the lightnes and convert back to rgb.

In [16]:
def normalize(images):
    """
    The pixel values of a normalized image range from 0 to 255. 
    So if an images darkest pixel is 10 and brightest pixel is 200,
    after normalzing they are 0 and 255. All other pixel values are 
    scaled accordingly.
    """
    result = np.empty(images.shape)
    for i in range(len(images)):
        image = images[i].astype(np.float32)
        image -= image.min()
        image *= (255.0/image.max())
        result[i] = image
    return result.astype(np.ubyte)
In [17]:
def float_image_to_deep_learning_input(images):
    """
    Input should have values in the range 0.0 - 1.0.
    Returns images in the range -1.0 - 1.0.
    This can not be plotted with matplotlib.
    """
    return (images - 0.5) * 2
In [18]:
from skimage import transform as itrf
def augment_image(image):
    """
    Adds random brightnes, shearing, rotation, translation and scaling to the image.
    """
    random_brightnes = np.clip(np.random.normal(loc = 1.0, scale = 0.5), 0.1, 3.0)
    image = np.clip((image * random_brightnes),0, 255).astype(np.ubyte)
    image = normalize(image)
    
    random_shear = np.random.normal(loc = 0.0, scale = 0.1)
    random_scale = (np.random.normal(loc = 1.0, scale = 0.1), np.random.normal(loc = 1.0, scale = 0.1))
    random_rotation = np.random.normal(loc = 0.0, scale = 0.1)
    random_translation = (np.random.normal(loc = 0.0, scale = 0.1), np.random.normal(loc = 0.0, scale = 0.1))
    afine_tf = itrf.AffineTransform(shear = random_shear, scale = random_scale, rotation = random_rotation)
    # print("Shear: " +  str(random_shear) + " Scale: " +  str(random_scale) + " Rot: " + str(random_rotation) + " Bright: " + str(random_brightnes))
    return itrf.warp(image, inverse_map=afine_tf)

Example of one image augmentation. (Execute multiple times to see the randomness)

In [19]:
plt.imshow(augment_image(X_train[100]))
Out[19]:
<matplotlib.image.AxesImage at 0x7ff9225c0a58>

Here I will show the results of different augmentation steps:

Original Image:

In [20]:
plt.imshow(X_train[100])
Out[20]:
<matplotlib.image.AxesImage at 0x7ff92252c978>

Adding random brightness:

In [21]:
random_brightnes = np.clip(np.random.normal(loc = 1.0, scale = 0.5), 0.1, 3.0)
image = np.clip((X_train[100] * random_brightnes),0, 255).astype(np.ubyte)
plt.imshow(image)
Out[21]:
<matplotlib.image.AxesImage at 0x7ff922518668>

Normalizing:

In [22]:
image = normalize(image)
plt.imshow(image)
Out[22]:
<matplotlib.image.AxesImage at 0x7ff935804828>

Random rotation, scale, sheer and translation. This is done at once by multipling the image with a warp matrix.

In [23]:
random_shear = np.random.normal(loc = 0.0, scale = 0.1)
random_scale = (np.random.normal(loc = 1.0, scale = 0.1), np.random.normal(loc = 1.0, scale = 0.1))
random_rotation = np.random.normal(loc = 0.0, scale = 0.1)
random_translation = (np.random.normal(loc = 0.0, scale = 0.1), np.random.normal(loc = 0.0, scale = 0.1))
afine_tf = itrf.AffineTransform(shear = random_shear, scale = random_scale, rotation = random_rotation)
image = itrf.warp(image, inverse_map=afine_tf)
plt.imshow(image)
Out[23]:
<matplotlib.image.AxesImage at 0x7ff9359a66a0>
In [24]:
def image_augmentation_pipeline(images):
    result = np.zeros(images.shape)
    for i in range(len(images)):
        result[i] = augment_image(images[i])
    
    return result
In [25]:
def prepare_images_for_deep_learning(images):
    images = normalize(images)
    images = to_float_image(images)
    images = float_image_to_deep_learning_input(images)
    return images

Augmentation

Here the augmentation of images get calculated. I tried to generate the augmented images during the training loop to add randomness and save memory, but that increased the training time significantly, which i did not prefer. I wanted to observe the improvements of the net while training so I could stop training early on bad results. Therefore I needed a fast training pipeline. So the images are generated before training.

In [26]:
%%time
AUGMENTATION_RUNS = 5
X_train_augmented =  np.zeros((AUGMENTATION_RUNS,) + X_train.shape)
for i in range(AUGMENTATION_RUNS):
    X_train_augmented[i] = image_augmentation_pipeline(X_train)
    
/home/carnd/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/ipykernel_launcher.py:12: RuntimeWarning: divide by zero encountered in true_divide
  if sys.path[0] == '':
/home/carnd/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/ipykernel_launcher.py:12: RuntimeWarning: invalid value encountered in multiply
  if sys.path[0] == '':
CPU times: user 3min 24s, sys: 1.34 s, total: 3min 26s
Wall time: 3min 26s
In [27]:
plot_classes(5, y_train, X_train_augmented[0], shuffle_in_class=False)

Image Preparation for Deep Learning

In [28]:
X_train_prepared = prepare_images_for_deep_learning(X_train)
X_train_augmented_prepared = prepare_images_for_deep_learning(X_train_augmented)
X_test_prepared = prepare_images_for_deep_learning(X_test)
X_valid_prepared = prepare_images_for_deep_learning(X_valid)

Model Architecture

Below is a table of the architecture. It is the same as the LeNet-5 architecture with more depth in the filters and therefore more fully connected nodes.

Layer Description
Input 32x32x3 Image Data
Convolution 5x5 Valid Padding, 1x1 Strides, 28x28x16 Output
Relu
Max Pool 2x2 Strides, 14x14x16 Output
Convolution 5x5 Valid Padding, 1x1 Strides, 10x10x32 Output
Relu
Max Pool 2x2 Strides, 5x5x32 Output
Flatten 800 Output
Fully Connected 200 Output
Relu
Fully Connected 100 Output
Relu
Dropout 50% Keep Probability
Fully Connected Output 43

Architecture and training approach

I used the same architecture as the LeNet-5 net, as it was build for image classification tasks. Also most of the code got implemented earlier in the nanodegree so it seemed obvious to reuse the code.

To improve the accuracy of the net, I changed the depth of the filters and therefore I had to increase the number of fully connected neurons accordingly. At first I started with quite deep filters, and a lot of neurons. Then I decreased the filters step by step to prevent overfitting until I got good results.

I tried to create an overfitting model first, so I know that the model is able to learn the concepts. Then I tried to decrease overfitting by decreasing filter depths, adding a dropout layer with a 50% keep probability and image augmentation.

I used the adam optimizer with its default learning rate of 0.001. I tried to lower the training rate, but that seemed to increase training time without accuracy improvements. The adam optimizer should addapt the training rate anyways.

I also tried to increase the batch size until I run in memory issues, but that seemed to lower the accuracy somehow. So I kept the batch size at 512. The model ran for 30 Epochs, but it didn't improve after 20.

As a final result I got a validation accuracy of 0.943 and test accuracy of 0.95.

In the loss chart below the training segment you can see that validations loss starts to rise again after about 10 Epochs of training. That shows that the net is still overfitting, but I'm happy with the results it produces anyways.

In [29]:
import tensorflow as tf
from tensorflow.contrib.layers import flatten

def LeNet(x):    
    # Arguments used for tf.truncated_normal, randomly defines variables for the weights and biases for each layer
    mu = 0
    sigma = 0.1
    
    # SOLUTION: Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x16.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 3, 16), mean = mu, stddev = sigma), name="conv1_W")
    conv1_b = tf.Variable(tf.zeros(16), name="conv1_b")
    conv1   = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b

    # SOLUTION: Activation.
    conv1 = tf.nn.relu(conv1, name="conv1_relu")

    # SOLUTION: Pooling. Input = 28x28x16. Output = 14x14x16.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID', name="conv1_maxpool")

    # SOLUTION: Layer 2: Convolutional. Output = 10x10x32.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 16, 32), mean = mu, stddev = sigma), name="conv2_W")
    conv2_b = tf.Variable(tf.zeros(32), name="conv2_b")
    conv2   = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    
    # SOLUTION: Activation.
    conv2 = tf.nn.relu(conv2, name="conv2_relu")

    # SOLUTION: Pooling. Input = 10x10x32. Output = 5x5x32.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID', name="conv2_maxpool")

    # SOLUTION: Flatten. Input = 5x5x36. Output = 800.
    fc0   = flatten(conv2)
    
    #Layer 3: Fully Connected. Input = 800. Output = 200.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(800, 200), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(200))
    fc1   = tf.matmul(fc0, fc1_W) + fc1_b
    
    # SOLUTION: Activation.
    fc1    = tf.nn.relu(fc1)

    # SOLUTION: Layer 4: Fully Connected. Input = 200. Output = 100.
    fc2_W  = tf.Variable(tf.truncated_normal(shape=(200, 100), mean = mu, stddev = sigma))
    fc2_b  = tf.Variable(tf.zeros(100))
    fc2    = tf.matmul(fc1, fc2_W) + fc2_b
    
    # SOLUTION: Activation.
    fc2    = tf.nn.relu(fc2)
    
    # Dropout
    drop = tf.nn.dropout(fc2, keep_prob)

    # SOLUTION: Layer 5: Fully Connected. Input = 100. Output = 43.
    fc3_W  = tf.Variable(tf.truncated_normal(shape=(100, 43), mean = mu, stddev = sigma))
    fc3_b  = tf.Variable(tf.zeros(43))
    logits = tf.matmul(drop, fc3_W) + fc3_b
    
    return logits

Train, Validate and Test the Model

A validation set can be used to assess how well the model is performing. A low accuracy on the training and validation sets imply underfitting. A high accuracy on the training set but low accuracy on the validation set implies overfitting.

Variables

In [30]:
x = tf.placeholder(tf.float32, (None, 32, 32, 3), name= "x")
y = tf.placeholder(tf.int32, (None), name= "y")
one_hot_y = tf.one_hot(y, 43, name= "one_hot_encoder")
keep_prob = tf.placeholder(tf.float32)

Training Pipeline

In [31]:
rate = 0.001

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=rate)
training_operation = optimizer.minimize(loss_operation)

Evaluation

In [32]:
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    total_loss = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
        loss = sess.run(loss_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})
        total_accuracy += (accuracy * len(batch_x))
        total_loss += (loss * len(batch_x))

    return total_accuracy / num_examples, total_loss / num_examples

Training

In [33]:
from sklearn.utils import shuffle

EPOCHS = 30
BATCH_SIZE = 512
dropout = 0.5
In [34]:
%%time



with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train_prepared)
    
    train_losses = []
    validation_losses = []
    
    train_accs = []
    validation_accs = []
    
    print("Training...")
    print()
    for i in range(EPOCHS):
        # Shuffle normal images.
        X_train_shuffled, y_train_shuffled = shuffle(X_train_prepared, y_train)
        
        # Shuffle augmentated images.
        for k in range(AUGMENTATION_RUNS):
            X_train_augmented_shuffled = np.zeros(X_train_augmented_prepared.shape)
            y_train_augmented_shuffled = np.zeros((AUGMENTATION_RUNS,) + y_train.shape)
            X_train_augmented_shuffled[k], y_train_augmented_shuffled[k] = shuffle(X_train_augmented_prepared[k], y_train)

        # Do Batch runs.
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train_shuffled[offset:end], y_train_shuffled[offset:end]
            
            # do a normal run.
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout})
            
            # do runs with augmentated images.
            for augmentation_run in range(AUGMENTATION_RUNS):
                augmented_batch_x, augmented_batch_y = X_train_augmented_shuffled[augmentation_run][offset:end], y_train_augmented_shuffled[augmentation_run][offset:end]
                sess.run(training_operation, feed_dict={x: augmented_batch_x, y: augmented_batch_y, keep_prob: dropout})
       
        # Calculate and print losses and accuracies.
        train_accuracy, train_loss = evaluate(X_train_prepared, y_train)
        validation_accuracy, validation_loss = evaluate(X_valid_prepared, y_valid)
        
        train_losses.append(train_loss)
        validation_losses.append(validation_loss)
        
        train_accs.append(train_accuracy)
        validation_accs.append(validation_accuracy)
        print("EPOCH {} ...".format(i+1))
        print("Train Accuracy = {0:.3f}".format(train_accuracy))
        print("Validation Accuracy = {0:.3f}".format(validation_accuracy))
        print("Validation Loss = " + str(validation_loss))
        print()
        
    saver.save(sess, './lenet')
    print("Model saved")
    
    pd.DataFrame({"Trainloss": train_losses, "Validationloss": validation_losses}).plot()
    pd.DataFrame({"Trainacc": train_accs, "Validationacc": validation_accs}).plot()
Training...

EPOCH 1 ...
Train Accuracy = 0.794
Validation Accuracy = 0.739
Validation Loss = 0.904380008614

EPOCH 2 ...
Train Accuracy = 0.951
Validation Accuracy = 0.871
Validation Loss = 0.413617065844

EPOCH 3 ...
Train Accuracy = 0.974
Validation Accuracy = 0.909
Validation Loss = 0.312673874427

EPOCH 4 ...
Train Accuracy = 0.986
Validation Accuracy = 0.920
Validation Loss = 0.253063216775

EPOCH 5 ...
Train Accuracy = 0.991
Validation Accuracy = 0.927
Validation Loss = 0.242943930154

EPOCH 6 ...
Train Accuracy = 0.992
Validation Accuracy = 0.940
Validation Loss = 0.217293340272

EPOCH 7 ...
Train Accuracy = 0.996
Validation Accuracy = 0.938
Validation Loss = 0.208018665817

EPOCH 8 ...
Train Accuracy = 0.993
Validation Accuracy = 0.933
Validation Loss = 0.232310384698

EPOCH 9 ...
Train Accuracy = 0.998
Validation Accuracy = 0.941
Validation Loss = 0.208919049477

EPOCH 10 ...
Train Accuracy = 0.997
Validation Accuracy = 0.945
Validation Loss = 0.196021937486

EPOCH 11 ...
Train Accuracy = 0.997
Validation Accuracy = 0.940
Validation Loss = 0.234045460204

EPOCH 12 ...
Train Accuracy = 0.999
Validation Accuracy = 0.946
Validation Loss = 0.200275571051

EPOCH 13 ...
Train Accuracy = 0.997
Validation Accuracy = 0.944
Validation Loss = 0.248334753584

EPOCH 14 ...
Train Accuracy = 0.998
Validation Accuracy = 0.945
Validation Loss = 0.229530826449

EPOCH 15 ...
Train Accuracy = 0.999
Validation Accuracy = 0.947
Validation Loss = 0.204539843164

EPOCH 16 ...
Train Accuracy = 0.999
Validation Accuracy = 0.948
Validation Loss = 0.232422652593

EPOCH 17 ...
Train Accuracy = 0.999
Validation Accuracy = 0.955
Validation Loss = 0.227606354152

EPOCH 18 ...
Train Accuracy = 0.999
Validation Accuracy = 0.947
Validation Loss = 0.248274665837

EPOCH 19 ...
Train Accuracy = 0.999
Validation Accuracy = 0.945
Validation Loss = 0.25914461876

EPOCH 20 ...
Train Accuracy = 0.999
Validation Accuracy = 0.948
Validation Loss = 0.231963134036

EPOCH 21 ...
Train Accuracy = 0.999
Validation Accuracy = 0.945
Validation Loss = 0.254336349235

EPOCH 22 ...
Train Accuracy = 0.999
Validation Accuracy = 0.949
Validation Loss = 0.279022761644

EPOCH 23 ...
Train Accuracy = 1.000
Validation Accuracy = 0.946
Validation Loss = 0.285295953566

EPOCH 24 ...
Train Accuracy = 1.000
Validation Accuracy = 0.947
Validation Loss = 0.311434539556

EPOCH 25 ...
Train Accuracy = 0.999
Validation Accuracy = 0.947
Validation Loss = 0.319325980721

EPOCH 26 ...
Train Accuracy = 0.999
Validation Accuracy = 0.946
Validation Loss = 0.271818849022

EPOCH 27 ...
Train Accuracy = 1.000
Validation Accuracy = 0.949
Validation Loss = 0.26734881184

EPOCH 28 ...
Train Accuracy = 1.000
Validation Accuracy = 0.950
Validation Loss = 0.296955858972

EPOCH 29 ...
Train Accuracy = 0.999
Validation Accuracy = 0.947
Validation Loss = 0.299036837332

EPOCH 30 ...
Train Accuracy = 0.999
Validation Accuracy = 0.951
Validation Loss = 0.328241604741

Model saved
CPU times: user 7min 55s, sys: 2min 39s, total: 10min 34s
Wall time: 11min 39s

Final test

In [65]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))

    test_accuracy, test_loss = evaluate(X_test_prepared, y_test)
    print("Test Accuracy = {:.3f}".format(test_accuracy))
Test Accuracy = 0.957

Missclassified Images

In [66]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    
    prediction = tf.argmax(logits,1)
    pred = sess.run(prediction, feed_dict = {x: X_test_prepared,  keep_prob: 1.0})
 
In [67]:
from sklearn.metrics import confusion_matrix

def plot_nth_most_confused(n, true_values, predicted_values):
    conf = confusion_matrix(true_values, predicted_values)
    np.fill_diagonal(conf, 0)

    flat = conf.flatten()
    flat.sort()

    real, pred = np.where(conf == flat[-n])
    real_class = np.where(y_test == real)[0][0]
    pred_class = np.where(y_test == pred)[0][0]

    plt.subplot(1, 2, 1)
    plt.imshow((X_test[real_class]))
    plt.subplot(1, 2, 2)
    plt.imshow((X_test[pred_class]))
    plt.suptitle("Real Class: " +  str(class_names.loc[real[0], "SignName"]) + "\nPrediction: " +  str(class_names.loc[pred[0], "SignName"]))

This are examples for the most confused classes from most confusions to less confusions. The signs are very similiar indeed.

In [68]:
plot_nth_most_confused(1, y_test, pred)
In [69]:
plot_nth_most_confused(2, y_test, pred)
In [70]:
plot_nth_most_confused(3, y_test, pred)
In [71]:
plot_nth_most_confused(4, y_test, pred)

Step 3: Test a Model on New Images

As I life in Germany, I drove around and took some images. I mounted my Phone behind my windshield to take the images. The windscreen was quite dirty, from time to time it was raining. So the images will be challenging. Also there is one 120 km/h speed limit image which was displayed with LEDs and the signs background is black instead of white. Im curious if that one gets recognized by the net.

Here is one example how the image was taken and for the dirty windshield in the corner :)

In [42]:
from scipy import misc
example = misc.imread("images/example.jpg")
plt.imshow(example)
Out[42]:
<matplotlib.image.AxesImage at 0x7ff809afd6a0>

Load and Output the Images

In [43]:
own_images = []
own_images.append(misc.imread("images/12.png"))
own_images.append(misc.imread("images/18-2.png"))
own_images.append(misc.imread("images/18.png"))
own_images.append(misc.imread("images/2.png"))
own_images.append(misc.imread("images/3.png"))
own_images.append(misc.imread("images/8-2.png"))
own_images.append(misc.imread("images/8.png"))
own_images = np.array(own_images)
own_images.shape
Out[43]:
(7, 32, 32, 4)
In [44]:
own_images = own_images[:, :, :, 0:3]
own_images.shape
Out[44]:
(7, 32, 32, 3)
In [45]:
own_classes = np.array([12, 18, 18, 2, 3, 8, 8])
In [46]:
for i in range(len(own_images)):
    plt.subplot(1, len(own_images), i + 1)
    plt.imshow(own_images[i])
    plt.axis('off')

Predict the Sign Type for Each Image

In [47]:
prepared = prepare_images_for_deep_learning(own_images)

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    
    prediction = tf.argmax(logits,1)
    pred = sess.run(prediction, feed_dict = {x: prepared,  keep_prob: 1.0})
In [48]:
class_names_join = class_names.set_index("ClassId")
In [49]:
pd.DataFrame({"Prediction": pred, "True Class": own_classes})\
    .join(class_names_join, on=("Prediction")).rename(columns={"SignName": "SignName Prediction"})\
    .join(class_names_join, on=("True Class")).rename(columns={"SignName": "SignName True Class"})
Out[49]:
Prediction True Class SignName Prediction SignName True Class
0 12 12 Priority road Priority road
1 18 18 General caution General caution
2 18 18 General caution General caution
3 2 2 Speed limit (50km/h) Speed limit (50km/h)
4 3 3 Speed limit (60km/h) Speed limit (60km/h)
5 5 8 Speed limit (80km/h) Speed limit (120km/h)
6 8 8 Speed limit (120km/h) Speed limit (120km/h)

It confused some of the images. I really don't know how it confused class 18 with 17. To confuse 8 with 14 seems reasonable.

In [50]:
def plot_image_and_true_class(image, true_class):
    plt.subplot(1, 2, 1 )
    plt.imshow(image)
    plt.title("Image")
    plt.subplot(1, 2, 2 )
    plt.title("Prediction")
    plt.imshow(X_test[np.where(y_test == true_class)][0])
In [51]:
i = 0;
plot_image_and_true_class(own_images[i], pred[i])
In [52]:
i += 1;
plot_image_and_true_class(own_images[i], pred[i])
In [53]:
i += 1;
plot_image_and_true_class(own_images[i], pred[i])
In [54]:
i += 1;
plot_image_and_true_class(own_images[i], pred[i])
In [55]:
i += 1;
plot_image_and_true_class(own_images[i], pred[i])
In [56]:
i += 1;
plot_image_and_true_class(own_images[i], pred[i])
In [57]:
i += 1;
plot_image_and_true_class(own_images[i], pred[i])

Analyze Performance

The accuracy on these images:

In [58]:
from sklearn.metrics import accuracy_score
accuracy_score(own_classes, pred)
Out[58]:
0.8571428571428571

Output Top 5 Softmax Probabilities For Each Image Found on the Web

Below you can see the top 5 class propabilities for each of my own image. The model has lower probabilities on "general caution" sign, as there are very similiar signs. Also the led speed limit sign has low probabilities.

In [59]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    probs = tf.nn.top_k(tf.nn.softmax(logits), 5)
    
    pred_probs = sess.run(probs, feed_dict = {x: prepared,  keep_prob: 1.0})
In [60]:
float_formatter = lambda x: "%.2f" % x
np.set_printoptions(formatter={'float_kind':float_formatter})

for i in range(len(own_classes)):  
    predicted = pd.DataFrame({"Predicted" : pred_probs[1][i], "Probability": pred_probs[0][i]}).set_index("Predicted").join(class_names)
    predicted.plot.bar(x = "SignName", y = "Probability")
    plt.title("True Class: " + class_names.loc[own_classes[i], "SignName"])
    plt.ylim(0,1)

Step 4 (Optional): Visualize the Neural Network's State with Test Images

This Section is not required to complete but acts as an additional excersise for understaning the output of a neural network's weights. While neural networks can be a great learning device they are often referred to as a black box. We can understand what the weights of a neural network look like better by plotting their feature maps. After successfully training your neural network you can see what it's feature maps look like by plotting the output of the network's weight layers in response to a test stimuli image. From these plotted feature maps, it's possible to see what characteristics of an image the network finds interesting. For a sign, maybe the inner network feature maps react with high activation to the sign's boundary outline or to the contrast in the sign's painted symbol.

Provided for you below is the function code that allows you to get the visualization output of any tensorflow weight layer you want. The inputs to the function should be a stimuli image, one used during training or a new one you provided, and then the tensorflow variable name that represents the layer's state during the training process, for instance if you wanted to see what the LeNet lab's feature maps looked like for it's second convolutional layer you could enter conv2 as the tf_activation variable.

For an example of what feature map outputs look like, check out NVIDIA's results in their paper End-to-End Deep Learning for Self-Driving Cars in the section Visualization of internal CNN State. NVIDIA was able to show that their network's inner weights had high activations to road boundary lines by comparing feature maps from an image with a clear path to one without. Try experimenting with a similar test to show that your trained network's weights are looking for interesting features, whether it's looking at differences in feature maps from images with or without a sign, or even what feature maps look like in a trained network vs a completely untrained one on the same sign image.

Combined Image

Your output should look something like this (above)

In [61]:
### Visualize your network's feature maps here.
### Feel free to use as many code cells as needed.

# image_input: the test image being fed into the network to produce the feature maps
# tf_activation: should be a tf variable name used during your training procedure that represents the calculated state of a specific weight layer
# activation_min/max: can be used to view the activation contrast in more detail, by default matplot sets min and max to the actual min and max values of the output
# plt_num: used to plot out multiple different weight feature map sets on the same block, just extend the plt number for each new feature map entry

def outputFeatureMap(image_input, tf_activation, activation_min=-1, activation_max=-1 ,plt_num=1):
    # Here make sure to preprocess your image_input in a way your network expects
    # with size, normalization, ect if needed
    # image_input =
    # Note: x should be the same name as your network's tensorflow data placeholder variable
    # If you get an error tf_activation is not defined it may be having trouble accessing the variable from inside a function
    activation = tf_activation.eval(session=sess,feed_dict={x : image_input})
    featuremaps = activation.shape[3]
    plt.figure(plt_num, figsize=(15,15))
    for featuremap in range(featuremaps):
        plt.subplot(6,8, featuremap+1) # sets the number of feature maps to show on each row and column
        plt.title('FeatureMap ' + str(featuremap)) # displays the feature map number
        if activation_min != -1 & activation_max != -1:
            plt.imshow(activation[0,:,:, featuremap], interpolation="nearest", vmin =activation_min, vmax=activation_max, cmap="gray")
        elif activation_max != -1:
            plt.imshow(activation[0,:,:, featuremap], interpolation="nearest", vmax=activation_max, cmap="gray")
        elif activation_min !=-1:
            plt.imshow(activation[0,:,:, featuremap], interpolation="nearest", vmin=activation_min, cmap="gray")
        else:
            plt.imshow(activation[0,:,:, featuremap], interpolation="nearest", cmap="gray")

In this activations, you can clearly recognize the number, the round shape and that there is a ring. FeatureMap 8 seems to look for a dark sign in front of a bright background, but the background should not be important.

In [62]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv1_relu:0")
    
    reshaped_image = np.reshape(prepared[3], (1, 32, 32, 3)) 
    
    outputFeatureMap(reshaped_image, activation)

With this sign it is really hard to recognize the numbers and the shape of the sign.

In [63]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv1_relu:0")
    
    reshaped_image = np.reshape(prepared[4], (1, 32, 32, 3)) 
    
    outputFeatureMap(reshaped_image, activation)

This is the 120km/h sign displayed with leds, that got confused with another image. I wonder why, as the activation map doesn't seem too different to the other activation maps of speed limit signs.

In [64]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv1_relu:0")
    
    reshaped_image = np.reshape(prepared[5], (1, 32, 32, 3)) 
    
    outputFeatureMap(reshaped_image, activation)

This is the seconds layer activiation. You can't really see a lot, as the resolution is very low. Maybe I should try to remove the maxpool layers... another time :)

In [72]:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))
    activation = tf.get_default_graph().get_tensor_by_name("conv2_relu:0")
    
    reshaped_image = np.reshape(prepared[3], (1, 32, 32, 3)) 
    
    outputFeatureMap(reshaped_image, activation)
In [ ]: